A Novel End-to-End Turkish Text-to-Speech (TTS) System via Deep Learning
نویسندگان
چکیده
Text-to-Speech (TTS) systems have made strides but creating natural-sounding human voices remains challenging. Existing methods rely on noncomprehensive models with only one-layer nonlinear transformations, which are less effective for processing complex data such as speech, images, and video. To overcome this, deep learning (DL)-based solutions been proposed TTS require a large amount of training data. Unfortunately, there is no available corpus Turkish TTS, unlike English, has ample resources. address our study focused developing speech synthesis system using DL approach. We obtained from male speaker Tacotron 2 + HiFi-GAN structure the system. Real users rated quality synthesized 4.49 Mean Opinion Score (MOS). Additionally, MOS-Listening Quality Objective evaluated objectively, obtaining score 4.32. The waveform inference time was determined by real-time factor, 1 s in 0.92 s. best knowledge, these findings represent first documented HiFi-GAN-based TTS.
منابع مشابه
Deep Speech: Scaling up end-to-end speech recognition
We present a state-of-the-art speech recognition system developed using end-toend deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model backgro...
متن کاملAutomatic Editing in a Back-End Speech-to-Text System
Written documents created through dictation differ significantly from a true verbatim transcript of the recorded speech. This poses an obstacle in automatic dictation systems as speech recognition output needs to undergo a fair amount of editing in order to turn it into a document that complies with the customary standards. We present an approach that attempts to perform this edit from recogniz...
متن کاملRobust end-to-end deep audiovisual speech recognition
Speech is one of the most effective ways of communication among humans. Even though audio is the most common way of transmitting speech, very important information can be found in other modalities, such as vision. Vision is particularly useful when the acoustic signal is corrupted. Multi-modal speech recognition however has not yet found wide-spread use, mostly because the temporal alignment an...
متن کاملEnd-to-End Optimized Speech Coding with Deep Neural Networks
Modern compression algorithms are often the result of laborious domain-specific research; industry standards such as MP3, JPEG, and AMR-WB took years to develop and were largely hand-designed. We present a deep neural network model which optimizes all the steps of a wideband speech coding pipeline (compression, quantization, entropy coding, and decompression) end-to-end directly from raw speech...
متن کاملEnd-to-End Deep Neural Network for Automatic Speech Recognition
We investigate the efficacy of deep neural networks on speech recognition. Specifically, we implement an end-to-end deep learning system that utilizes mel-filter bank features to directly output to spoken phonemes without the need of a traditional Hidden Markov Model for decoding. The system will comprise of two variants of neural networks for phoneme recognition. In particular, we utilize conv...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Electronics
سال: 2023
ISSN: ['2079-9292']
DOI: https://doi.org/10.3390/electronics12081900